DETAILED ACTION



1. This communication is in response to the Appeal Brief filed 07/21/2006.

2. Claims 1-31 are pending. Claims 1, 5, 21, 23 and 25 are independent. 

Response to Arguments 

3. Applicant's arguments with respect to claims 1-3, 5-18, 21-23 and 25-28 have 
been considered but are moot in view of the new ground(s) of rejection. 

Response to Amendment 

4. Applicant's request for reconsideration of the finality of the rejection of the last 
Office action is persuasive and, therefore, the finality of that action is withdrawn. 

Claim Rejections - 35 USC § 102 



5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

6. Claim 23 is rejected under 35 U.S.C. 102(b) as being anticipated by Basson et al.
(US 6,996,526 B2). 

As to claim 23, Basson et al. teach:

converting spoken words in an information stream to written text, the information 
stream containing audio information (transcribing speech from an input stream of 
speech, col. 2, lines 44-48). 

generating a separate encoded file for every word, wherein each encoded file 
shares a common time reference (storing a selected word from each of the different
automatic speech recognition systems, and the time segment in which that word was
recorded, col. 3, lines 5-10 and col. 4, lines 45-60).

Claim Rejections - 35 USC § 103 

7. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

8. Claims 1-3, 5-17, 20-22 and 25-28 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Charlesworth et al. (6,990,448) in view of D'hoore et al. (6,085,160). 

As to claims 1, 5, 21 and 25, Charlesworth et al. teach:
identifying attributes including one or more types of accents and one or more
types of human languages from a multi-party audio information stream (identifying
attributes from a communication stream involving more than one speaker, where for
each speaker, the speaker's language, accent, dialect and phonetic set are identified,
col. 9, lines 38-49);

encoding each identified attribute from the audio information stream into a time
ordered index, each of the identified attributes sharing a common time reference
(storing the identified attributes in annotation data within a header, col. 9,
lines 43-49, where the header includes a time index which associates the location of
the blocks of annotation data within the memory, col. 5, lines 52-58);

comparing results at approximately the same time to generate an integrated time
ordered index of the identified attributes (identifying the language of the speaker, col. 5,
lines 62-63, and creating a time index associating the location of the blocks that have that
attribute, col. 5, lines 50-67).

A computer readable storage medium to store the software engine (a personal 
computer with programmable code stored within it, col. 3, lines 1-5). 

Charlesworth et al. do not explicitly teach comparing results from different human 
language models. 

However, D'hoore et al. teach comparing the results from different language 
models within a multi-language model to find the best phoneme combination (col. 8, 
lines 26-50). 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to combine the methods of Charlesworth et al. with the different 
language models within a multi-language model to reduce the error when matching a 
model to uttered speech during speech recognition, since a system with only a single 
language model is more prone to pronunciation mistakes during recognition, while a 
system with different language models within a multi-language model is able to detect
and correct pronunciation mistakes during automatic speech recognition, as taught by
D'hoore et al. (col. 8, lines 37-57). 

As to claim 2, Charlesworth et al. teach comparing confidence ratings (col. 6,
lines 12-20).

Charlesworth et al. do not teach the confidence ratings of different human
languages. However, Charlesworth et al. teach that the confidence ratings are based on a
phoneme representation of the data, where it would be obvious to one of ordinary skill in
the art at the time of the invention that when different human languages are used to find
the correct human language, the weights for the phonemes will be different for each of
the languages.

As to claim 3, Charlesworth et al. teach generating a transcript including each
spoken word, wherein each spoken word shares the common time reference
(generating a transcript of the spoken data, col. 11, lines 25-30, and generating the
annotation data, where the annotation data contains a time index, col. 5, lines 52-58).

As to claims 6 and 22, Charlesworth et al. teach generating a query on one or
more of the identified attributes in the time ordered index (generating a query based on
an attribute, col. 6, lines 24-35).

As to claims 7 and 18, Charlesworth et al. teach correlating a first identified 
attribute of the information stream with a second identified attribute having a similar time 
code (grouping attributes under one memory block with similar time codes, col. 5, lines 
45-57). 

As to claim 8, Charlesworth et al. teach the audio information stream comes from 
an unstructured information source (inputting conversational language with video, col. 9, 
lines 38-45). 

As to claim 9, Charlesworth et al. teach the audio information stream includes 
audio-visual data (inputting conversational language with video, col. 9, lines 38-45). 

As to claim 10, Charlesworth et al. teach the audio information stream includes 
speech data (inputting conversational language with video, col. 9, lines 38-45). 

As to claim 11, Charlesworth et al. teach at least one of the identified attributes
further comprises a change of accent (the speakers are identified along with their
accents, col. 9, lines 38-45, where, if the speakers had different accents, a change
of accent would necessarily be identified).

As to claim 12, Charlesworth et al. teach at least one of the identified attributes
further comprises a change of human language (the speakers are identified along with
their language, col. 9, lines 38-45, where, if the speakers had different languages, a
change of language would necessarily be identified).

As to claim 13, Charlesworth et al. do not teach at least one of the identified
attributes further comprises a discrete spoken word. 

However, D'hoore et al. teach recognizing words from the input (col. 2, lines 45-46).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to combine the methods of Charlesworth et al. with the word
recognition of D'hoore et al., allowing the system to determine how well a user
pronounced the word, to improve a user's ability to learn the pronunciation of a
language, as taught by D'hoore et al. (col. 8, lines 20-31).

As to claim 14, Charlesworth et al. teach the identified attributes are encoded via 
extensible markup language (encoding the identified attributes to be sent over a data 
network, such as the internet, where it would be obvious to one of ordinary skill in the 
art at the time of the invention that the attributes are encoded via extensible markup 
language since extensible markup language is a common encoding format for sending
data over the internet). 

As to claims 15 and 26, Charlesworth et al. teach the time ordered index includes
a start time and a duration in which each identified attribute was conveyed (col. 5, lines
48-57).

As to claim 16, Charlesworth et al. teach the common time reference comprises 
a time indication (header with time indication, col. 5, lines 38-55). 

As to claim 17, Charlesworth et al. teach the common time reference comprises 
a frame count (header with time information and duration related to a video input, col. 5, 
lines 38-55). 

As to claim 20, Charlesworth et al. teach the integrated time ordered index
includes data from different human language models (the time index includes
information about the used vocabulary and language, col. 5, lines 18-30).

As to claim 27, Charlesworth et al. teach one or more attribute filters generate 
time ordered index of the audio information stream in real time (the attributes of the 
audio information are generated as the information is inputted into the system, col. 3, 
lines 5-11). 

As to claim 28, Charlesworth et al. teach the audio information stream passes
through the one or more attribute filters a single time (the audio data is processed as it
is inputted into the system, and the current data is processed by the system, and then
the next data is processed, col. 4, lines 33-42).

Allowable Subject Matter 

9. Claims 4, 19, 24 and 29-31 are objected to as being dependent upon a rejected 
base claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. 

10. The following is a statement of reasons for the indication of allowable subject 
matter: 

As to claim 4, Charlesworth et al. (the closest prior art of record) do not teach nor
fairly suggest in combination with claim 1 triggering an event to occur upon an
identification of unique voice characteristics of a speaker in less than five seconds.

As to claim 19, Charlesworth et al. do not teach nor fairly suggest in combination 
with claim 18, the similar time code comprises the first identified attribute possessing a 
start time approximately the same as the second identified attribute or an overlapping of
the durations associated with the first identified and the second identified attributes.

As to claim 24, Charlesworth et al. do not teach nor fairly suggest in combination 
with claim 23 generating a link to relevant material based upon the spoken words and 
synchronizing a display of the link in less than five seconds from analyzing the 
information stream. 

As to claim 29, Charlesworth et al. do not teach nor fairly suggest in combination
with claim 25 a manipulation module to perform operations on a first set of attributes in
order to manipulate a second set of attributes.

As to claim 31, Charlesworth et al. do not teach nor fairly suggest in combination 
with claim 25 a triggering and synchronization module to dynamically trigger a link and 
synchronize the appearance of the link based upon a transcribed text from the 
information stream. 

Claim 30 would be allowable since it depends from claim 29, which has been
indicated to contain allowable subject matter.



Conclusion 

11. The prior art made of record and not relied upon is considered pertinent to
applicant's disclosure. See PTO-892.

12. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Thomas E. Shortledge whose telephone number is 
(571)272-7612. The examiner can normally be reached on M-F 8:00 - 4:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (571)272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



TS 

10/06/06 